Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: P. Satyanarayana Goud, Dr. Panyam Narahari Sastry, Dr. P. Chandra Sekhar
DOI Link: https://doi.org/10.22214/ijraset.2023.55320
ECG examination is not limited to the diagnosis of cardiovascular disease; it also supports the diagnosis of other diseases and pathological conditions of a patient. ECG data have numerous uses in medical and surgical practice, namely preoperative and postoperative evaluation, drug efficacy evaluation, detection of side effects and general health diagnosis. Hence, this is an open area of research. This work presents an ECG disease classification model using time and frequency domain features with machine learning models. The baseline wandering problem of the ECG signal (a typical noise) is solved by filtering. Time domain features such as heart rate, RR intervals, QRS duration, Shannon entropy and average heart rate variability are used along with frequency domain features such as wavelet features, FFT features and DWT leader. These features are used to train the proposed model. An SVM model is used to identify and classify the various heart diseases. Two data sets, constructed from the MIT-BIH website, are considered to validate the proposed model. The results of the proposed model are compared with those of the existing algorithms. The recognition accuracy of the proposed model is 97.95% for “Data set-1” and 94.47% for “Data set-2”, which is better than the existing results.
I. INTRODUCTION
Cardiovascular disease is one of the leading causes of morbidity and mortality across the world. These diseases shorten patients’ lives and increase the costs borne by national health organisations. In addition, the prevalence and incidence of clinically significant cardiac arrhythmias are increasing in association with population aging. Atrial Fibrillation (AF) is a type of sustained arrhythmia [1] that is common in adults, and it shows a significant growing trend, specifically in the elderly population and in people with obesity disorders. AF is sometimes challenging to diagnose owing to the possible absence of symptoms or its paroxysmal behaviour. Portable devices are therefore attracting great research interest as monitoring devices in clinical settings. To provide a reliable AF diagnosis, automatic techniques have been applied to Electrocardiogram (ECG) signals [2] obtained with portable devices. However, a challenge still exists, especially if other normal or pathological rhythms are also considered. The ECG shows information on cardiac electrical activity, revealing some of the arrhythmias [3]. Differences from a normal rhythm help to diagnose deviations in the conduction pathways, enlargement of the heart muscle, imbalances in hormones or cellular ion channels, or even the appearance of a myocardial infarction. The most prominent complex on the ECG is called QRS [4], where R indicates the highest peak of the signal. This is why the R-R interval typically serves to indicate the rate at which the heart beats. On the other hand, the P wave indicates atrial activation, while the T wave indicates the repolarization of the ventricles. Figure 1 shows a representation of the most important elements involved in the generation of the P-QRS-T complex of the ECG, that is, during a heartbeat or cycle.
In adults, the most common sustained cardiac arrhythmia is Atrial Fibrillation (AF), which is characterized predominantly by uncoordinated atrial activation, with consequent deterioration of the atrial mechanical function [5].
In this regard, Medical Decision Support Systems (SADMs) provide specific and accurate knowledge for decision making during prognosis, diagnosis, treatment and patient management. SADMs [6] are related to the concept of evidence-based medicine, as they infer knowledge from clinical databases and later use the acquired knowledge to assist in the diagnosis of new patients.
The most complex SADMs are based on the use of Artificial Intelligence (AI) in medicine [7], specifically the branch of Machine Learning, where algorithms are used to obtain predictive models from clinical input data.
On the other hand, Pattern Recognition (PR) is the natural next stage of AI, where the resulting predictive models assign a label or class to a set of input values for which it is initially unknown to which group they belong. In classification problems, these class labels are known a priori.
In this research work, the most direct example of Pattern Recognition (PR) is to determine whether an ECG signal corresponds to a person suffering from AF, to any other pathological rhythm or, conversely, to an individual with a normal heart rhythm. Machine Learning [8] is a subset of techniques within the field of Artificial Intelligence that, through statistical methods, gives computers the ability to learn. This learning process consists of the progressive improvement of performance in a specific task using only data in the form of observations or samples, thus providing models or data structures that progressively adapt to the input data. Current ECG data classification techniques report low accuracy for several classes of arrhythmias due to class conflict and class imbalance problems. Existing machine learning methods such as KNN, Random Forest and Naive Bayes produce low accuracy. In this paper a multiclass SVM is proposed. This method uses frequency domain and time domain features to train the model, so that on the test cases the model produces better accuracy than the existing methods.
The organization of this work is as follows: section II describes the literature survey, the proposed method is explained in section III, and the results are discussed in section IV.
II. LITERATURE SURVEY
SK Pandey et al. [9] introduced three ANN models that categorize records into arrhythmia and healthy classes based on features extracted from 12-lead ECG signals in the UCI repository. Specifically, the ANN models trained and tested were the Radial Basis Function (RBF) network, the Recurrent Neural Network (RNN) and the back-propagation feedforward neural network. The test results were evaluated in terms of specificity, sensitivity and classification accuracy. Among the compared ANN models, the RNN model showed the best diagnosis results, with a testing classification accuracy of 83.1% on the chosen attributes.
Eric Manibardo et al. [10] reviewed 4214 Electrocardiogram (ECG) recordings from AED rhythm analyses and annotated the rhythm, using the consensus decision as the ground truth. A total of 22 VT, 472 VF, 1009 PEA, 294 PR, and 2418 AS were included. To develop the automatic rhythm annotator, the ECG analysis intervals were extracted and the data were partitioned patient-wise into training (70%) and test (30%) sets. The performance was evaluated in terms of per-class sensitivity (Se) and F-score (F1), and the unweighted mean of sensitivities (UMS) and of F-scores were used as global performance parameters. The classification technique comprised a denoising stage and feature extraction based on the stationary wavelet transform of the ECG, followed by a random forest classifier. The per-rhythm Se/F1 of the best model was 81.9/72.2, 94.2/96.1, 85.3/81.3, 43.3/52.2 and 95.8/95.7 for VT, VF, PEA, PR and AS respectively. For the test set, the UMS was 80.2%, more than 2 points above that of previous solutions. This method can be used to annotate large OHCA datasets retrospectively, reducing the workload of manual OHCA rhythm annotation.
Namrata Singh et al. [11] presented a model to diagnose cardiac arrhythmias by selecting the best features with filter-based feature selection techniques, integrated with three different ML techniques, on a cardiac arrhythmia dataset. Feature selection is the crucial pre-processing step that determines the factors responsible for patients suffering from arrhythmia. The health factors of patients can thus be examined, as they are powerful predictors of heart-related deaths.
To assess the performance of the feature selection methods, three ML techniques were incorporated: JRip, random forest and linear SVM. In the experimental results, the highest accuracy of 85.58% was achieved by the random forest classifier with gain ratio feature selection on a subset of 30 features.
MU Khan et al. [12] proposed a signal processing technique for predicting coronary artery disease from raw ECG signals of 9 to 12 minutes. First, the raw ECG recording is pre-processed and segmented using Empirical Mode Decomposition (EMD) and the chosen intrinsic mode functions (IMF) 2-5. The data are classified using features such as Higuchi Fractal Dimension, Quantile, Energy Entropy, Spectral Entropy, Root sum square, Energy, Marginal Factor, Impulse Factor, Shape Factor, Kurtosis and Skewness. The pre-processed signal is fed into the SVM classifier.
The system achieves an accuracy of 95.5% on self-collected data. Cardiologists can use the proposed system as an aid in making effective treatment decisions.
Saeed Mian Qaisar et al. [13] contributed to research on developing computationally efficient multirate automated ECG arrhythmia detectors. A combination of multirate denoising and wavelet decomposition is used to realize wireless ECG implants effectively. Features are mined from the decomposed signal sub-bands, and a mature K-Nearest Neighbour (KNN) classifier is used to diagnose the arrhythmia. The multirate nature substantially reduces the system’s processing activity and allows a dramatic reduction in energy consumption compared with existing methods.
III. PROPOSED METHOD
The flow chart of the proposed method is shown in Figure 2 and the block diagram in Figure 3. The flow chart shows that the overall work is carried out in 3 stages.
A. Baseline Wandering (BW) Correction
The baseline wandering (BW) can be captured by the coarse approximation $A_{j_0} = \sum_{k} a_{j_0,k}$ once a proper wavelet function and resolution level $j_0 \in \{1, \dots, J\}$ have been chosen. The BW is then removed by subtracting this part from the raw ECG signal.
The selection of the right wavelet for a particular application remains an open question. For ECG BW correction, the wavelet is selected so that it resembles the characteristic and significant QRS waveform of the ECG signal.
After subtracting $A_{j_0}$ from the raw ECG, the interesting detail features are kept in the details $\sum_{j=j_0} D_j$. For a suitable selection of $j_0$, the BW is captured by $A_{j_0}$ without any over-smoothing. The maximum number of decomposition levels is determined by the signal length.
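The subtraction of the coarse approximation described above can be sketched as follows. This is only an illustration under stated assumptions: the Haar wavelet is used because its level-$j_0$ approximation reduces to block means and needs no external library, whereas the paper selects a wavelet that resembles the QRS waveform and does not name it. The function names `haar_approximation`, `remove_baseline` and `max_level`, and the synthetic test signal, are hypothetical.

```python
def haar_approximation(signal, level):
    """Coarse Haar approximation A_{j0}: each block of 2**level samples
    is replaced by its mean (assumes the Haar wavelet, not the paper's)."""
    block = 2 ** level
    if len(signal) % block != 0:
        raise ValueError("signal length must be a multiple of 2**level")
    approx = []
    for start in range(0, len(signal), block):
        mean = sum(signal[start:start + block]) / block
        approx.extend([mean] * block)
    return approx


def remove_baseline(signal, level):
    """Subtract the coarse approximation from the raw signal, keeping the details."""
    approx = haar_approximation(signal, level)
    return [x - a for x, a in zip(signal, approx)]


def max_level(n_samples):
    """Largest Haar decomposition level for a signal of n_samples, floor(log2(n))."""
    return n_samples.bit_length() - 1


# Synthetic example: slow linear drift plus a fast alternating component.
signal = [0.05 * i + (1.0 if i % 2 else -1.0) for i in range(64)]
cleaned = remove_baseline(signal, level=3)
```

Because the Haar approximation is piecewise constant, a residual slope remains inside each block; a smoother wavelet would track the drift more closely, which is consistent with the remark above that the choice of wavelet matters.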
B. Feature Extraction Techniques
The objective of the feature extraction stage is to transform the segment of the signal to be analysed in such a way that the relevant clinical information is obtained with a reduced number of coefficients. In this way, it is possible to represent the signal in a space whose metric minimizes the distance between patterns of the same class and maximizes the distance between patterns of different classes.
Two types of features are extracted from the ECG signal: i) frequency domain features and ii) time domain features.
1) Frequency Domain Features
The frequency domain features are extracted from the wavelet transform and the FFT.
a) Wavelet Transformed (WT) Characteristics
The WT decomposes the signal into its different spectral components, in such a way that each of them has a resolution according to its scale. The function of the real variable t is known as the mother wavelet function; it must oscillate in time, in addition to being well localized in the time domain. The scale parameter “a” is associated with a stretching or shrinking of the mother function. The translation parameter “b” allows the temporal location of the energy distribution. From the mother function, the wavelet functions are generated through joint operations of change of scale and translation, in the form of equation (1).
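For reference, the textbook form of a wavelet family generated from a mother wavelet $\psi(t)$ by scaling with $a$ and translating by $b$ is given here. It is assumed, not confirmed by the text, that equation (1) follows this standard definition; the normalization factor $1/\sqrt{a}$ is the usual energy-preserving choice.

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),
\qquad a > 0,\quad b \in \mathbb{R}
```

Large values of $a$ stretch the wavelet and probe low-frequency content, while small values compress it and probe high-frequency content.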
C. Shannon Entropy
The average amount of information (entropy) is a measure of how much information the source is producing. In physics, the word entropy often appears, and its meaning refers to concepts such as “randomness”, “irregularity” and “ambiguity”. Information theory refers to the exact same concept, meaning that the more irregular the information, the more information it carries on average.
Suppose that two symbols, A and B, are output at random. An information source whose output does not depend on the past, so that each symbol is emitted independently, is called a memoryless information source. If the symbols occur with probabilities PA and PB respectively, the average amount of information is given by equation (3).
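For a memoryless source with two symbols, the standard Shannon entropy is $H = -P_A \log_2 P_A - P_B \log_2 P_B$, measured in bits; it is assumed here that equation (3) has this form. The short sketch below evaluates it for any discrete distribution. The function name `shannon_entropy` is hypothetical, and how the paper estimates the probabilities from the ECG signal is not stated.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete memoryless source.

    Terms with zero probability contribute nothing, by the usual
    convention 0 * log2(0) = 0.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Two equally likely symbols A and B carry one bit on average.
fair_source = shannon_entropy([0.5, 0.5])
# A strongly biased source is more regular and carries less information.
biased_source = shannon_entropy([0.9, 0.1])
```

The fair source gives exactly one bit, while the biased source gives less, which matches the statement above that more irregular information carries more information on average.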
SVM models are defined as binary classifiers (two classes to be labelled) in which no decision threshold has to be selected at the output, since the classification is carried out by labelling each sample on one side or the other of the created hyperplane, as shown in Figure 7. To convert the SVM into a classifier of more than two classes, the “one-vs-one” or “one against one” strategy was followed in this work, in which a total of C(C-1)/2 binary classifiers must be trained for a problem with C classes. In the present case of three classes, a total of three classifiers were trained, and the corresponding voting system among the outputs of the models was implemented to classify a sample. At each point in the SVM hyperparameter search grid, the same values were used to train all of the binary classifiers, thus avoiding a combinatorial explosion that would be unapproachable in practice.
On the other hand, SVM models can handle a variable C that allows some flexibility by controlling the trade-off between rigid margins and training errors. A soft margin is created by allowing some classification errors while penalizing them at the same time. Thus, the hyperparameter C represents the compromise between the size of the margin and the number of misclassifications. With everything described above, it can be concluded that the performance of the SVM depends on the kernel function used, its parameters and the margin penalty parameter C.
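The one-vs-one scheme and its voting step can be sketched as follows. The binary deciders here are toy nearest-centre rules on a single made-up feature, standing in for trained SVMs; `CENTERS`, `make_toy_classifier`, `one_vs_one_pairs` and `predict_one_vs_one` are hypothetical names and values introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs: C(C-1)/2 binary problems for C classes."""
    return list(combinations(classes, 2))


def predict_one_vs_one(sample, binary_classifiers):
    """Majority vote over the binary classifiers.

    binary_classifiers maps a (class_a, class_b) pair to a function that
    returns class_a or class_b for a sample. Ties go to the class that
    received its first vote earliest.
    """
    votes = Counter(clf(sample) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]


# Toy stand-ins for trained binary SVMs (assumed, not from the paper):
# each decider picks the class whose centre is nearer on one feature.
CENTERS = {"ARR": 0.0, "CHF": 5.0, "NSR": 10.0}


def make_toy_classifier(class_a, class_b):
    def decide(x):
        nearer_a = abs(x - CENTERS[class_a]) <= abs(x - CENTERS[class_b])
        return class_a if nearer_a else class_b
    return decide


pairs = one_vs_one_pairs(list(CENTERS))
classifiers = {pair: make_toy_classifier(*pair) for pair in pairs}
prediction = predict_one_vs_one(4.0, classifiers)
```

With three classes the scheme needs three binary classifiers; with the five classes of “Data set-2” the same formula gives ten.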
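The trade-off controlled by the penalty parameter can be made explicit with the standard soft-margin SVM primal problem. This is the textbook formulation, given under the assumption that the paper uses it unchanged; $\xi_i$ are slack variables, $\phi$ is the feature map induced by the kernel, and $y_i \in \{-1, +1\}$. Note that $C$ here is the margin penalty, not the number of classes.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\,\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,N
```

A large $C$ penalizes training errors heavily and approaches a rigid margin, while a small $C$ tolerates more misclassifications in exchange for a wider margin.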
IV. RESULTS AND DISCUSSION
A. Data set-1
To evaluate the different methodologies of the competition participants, the database was divided into two data sets: training and testing. The training set consisted of 113 records. The test set contained 49 recordings, with lengths and a distribution over the three classes similar to those of the training set. Data set-1 has 3 different classes: Cardiac Arrhythmia (ARR), Normal Sinus Rhythm (NSR) and Congestive Heart Failure (CHF). The process of generating a model for pattern recognition based on machine learning is divided into two main phases: training and recognition. The data set used to build the model during the training phase is called the training set. It is in this phase that an adaptive model is adjusted to obtain the best possible generalization, and in that way to resolve new cases during the recognition phase. Once the model is ready, it can be incorporated into a computer system to identify and classify new observations.
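A split that keeps the class distribution similar in both sets can be sketched with a per-class (stratified) partition. Both the 70/30 ratio and the stratification are assumptions, since the text only reports the resulting sizes. The per-class counts below are placeholders chosen so that the totals reproduce the 113/49 split; the paper does not state them. `stratified_split` is a hypothetical helper name.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split record indices into train/test sets class by class,
    so that both sets have a similar class distribution."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Placeholder class counts (assumed, not taken from the paper): 162 records.
labels = ["ARR"] * 96 + ["CHF"] * 30 + ["NSR"] * 36
train_idx, test_idx = stratified_split(labels)
```

With these placeholder counts the helper yields 113 training and 49 test records, and every record lands in exactly one of the two sets.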
1) SVM Training and Validation
The following describes the experimentation performed with Support Vector Machine models, which consisted, as in previous models, of training and cross-validating the classification performance on the selected data sets. In both experiments the same search over the different hyperparameters, detailed below, was used.
As described above, an SVM classifies by means of the hyperplane that maximizes the margin between two classes in the training data. The vectors that define this hyperplane, selected from the independent predictors or variables, are the so-called ‘support vectors’.
The machine learning algorithm predicts each element in the verification set, determines whether it is negative or positive, and then classifies all elements into the following four categories based on the prediction and the gold-standard label: True Negatives (TN), True Positives (TP), False Positives (FP) and False Negatives (FN).
In this work, the classes Cardiac Arrhythmia (ARR), Normal Sinus Rhythm (NSR) and Congestive Heart Failure (CHF) were considered for “Data set-1”, and Cardiac Arrhythmia (ARR), Atrial Fibrillation (ATF), Malignant Ventricular Ectopy (MVE), Normal Sinus Rhythm (NSR) and Supra Ventricular Arrhythmia (SVA) were considered for “Data set-2”, both from the MIT-BIH database. The confusion matrices for both training and testing were obtained using the SVM classifier. The recognition accuracy obtained by the proposed method is 96.41% for “Data set-1”, which is in line with the existing random forest classifier and far better than the existing KNN classifier. The recognition accuracy obtained is 94.47% for “Data set-2”, which is better than both the random forest classifier and the KNN classifier. The proposed method thus gave superior recognition accuracy compared to the existing KNN classifier for both “Data set-1” and “Data set-2”. The precision, recall and F1-score values of the proposed algorithm are 97.23%, 96.29% and 96.59% respectively for “Data set-1”, and 96%, 95.36% and 95.58% respectively for “Data set-2”. The future scope of this work can be a hybrid classifier consisting of two different classification techniques.
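In a multiclass setting, the four categories are counted per class by treating that class as positive and all others as negative. The sketch below derives accuracy, precision, recall and F1-score from a confusion matrix. The matrix values are invented for illustration and are not the paper's results; `per_class_metrics` is a hypothetical name.

```python
def per_class_metrics(confusion, classes):
    """Accuracy plus per-class (precision, recall, F1).

    confusion[i][j] is the number of samples whose true class is
    classes[i] and whose predicted class is classes[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(classes)))
    report = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true class k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # other class, predicted k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = (precision, recall, f1)
    return correct / total, report


# Invented toy confusion matrix (rows: true class, columns: predicted class).
classes = ["ARR", "CHF", "NSR"]
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
accuracy, report = per_class_metrics(confusion, classes)
```

Averaging the per-class values (macro averaging) is one common way to arrive at single precision, recall and F1 figures; which averaging the paper uses is not stated.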
[1] Dominik Linz, Adrian D. Elliott, Mathias Hohl, Varun Malik, Ulrich Schotten, Dobromir Dobrev, Stanley Nattel et al., “Role of autonomic nervous system in atrial fibrillation”, International Journal of Cardiology 287 (2019): 181-188.
[2] Turker Tuncer, Sengul Dogan, Pawel Plawiak and U. Rajendra Acharya, “Automated arrhythmia detection using novel hexadecimal local pattern and multilevel wavelet transform with ECG signals”, Knowledge-Based Systems 186 (2019): 104923.
[3] S. Sahoo, M. Dash, S. Behera and S. Sabut, “Machine learning approach to detect cardiac arrhythmia in ECG signals: A survey”, IRBM 41, no. 4 (2020): 185-194.
[4] Jagdeep Rahul, Marpe Sora and Lakhan Dev Sharma, “Exploratory data analysis based efficient QRS complex detection technique with minimal computational load”, Physical and Engineering Sciences in Medicine 43, no. 3 (2020): 1049-1067.
[5] Vignesh Kalidas and Lakshman S. Tamil, “Detection of atrial fibrillation using discrete-state Markov models and Random Forests”, Computers in Biology and Medicine 113 (2019): 103386.
[6] Samia Sbissi, Mariem Mathfoudh and Said Gattoufi, “A medical decision support system for cardiovascular disease based on ontology learning”, in 2020 International Multi-Conference on “Organization of Knowledge and Advanced Technologies” (OCTA), pp. 1-9, IEEE, 2020.
[7] Zhanquan Sun, Chaoli Wang, Yangyang Zhao and Chao Yan, “Multi label ECG signal classification based on ensemble classifier”, IEEE Access 8 (2020): 117986-117996.
[8] Zhaoyang Ge, Zhihua Zhu, Panpan Feng, Shuo Zhang, Jing Wang and Bing Zhou, “ECG signal classification using SVM with multi feature”, in 2019 8th International Symposium on Next Generation Electronics (ISNE), pp. 1-3, IEEE, 2019.
[9] Saroj Kumar Pandey and Rekh Ram Janghel, “ECG arrhythmia classification using artificial neural networks”, in Proceedings of 2nd International Conference on Communication, Computing and Networking, pp. 645-652, Springer, Singapore, 2019.
[10] Eric Manibardo, Unai Irusta, Javier Del Ser, Elisabete Aramendi, Iraia, Mikel Olabarria, Carlos Corcuera, Jose Veintemillas and Andima Larrea, “ECG based random forest classifier for cardiac arrest rhythms”, in 2019 41st International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1504-1508, IEEE, 2019.
[11] Namrata Singh and Pradeep Singh, “Cardiac arrhythmia classification using machine learning techniques”, in Engineering Vibration, Communication and Information Processing, pp. 469-480, Springer, Singapore, 2019.
[12] Muhammad Umar Khan, Sumair Aziz, Syed Zohaib Hassan Naqvi and Abdul Rehman, “Classification of coronary artery diseases using electrocardiogram signals”, in 2020 International Conference on Engineering Trends in Smart Technologies (ICETST), pp. 1-5, IEEE, 2020.
[13] Saeed Mian Qaisar, Moez Krichen and Fatma Jallouli, “Multirate ECG processing and k-nearest neighbor classifier based efficient arrhythmia diagnosis”, in International Conference on Smart Homes and Health Telematics, pp. 329-337, Springer, Cham, 2020.
[14] B. Venkataramanaiah and J. Kamala, “ECG signal processing and KNN classifier based abnormality detection by VH-doctor for remote cardiac health care monitoring”, Soft Computing 24, no. 22 (2020): 17457-17466.
[15] Sudestna Nahak and Gouthm Saha, “A fusion based classification of normal, arrhythmia and congestive heart failure in ECG”, in 2020 National Conference on Communications (NCC), pp. 1-6, IEEE, 2020.
Copyright © 2023 P. Satyanarayana Goud, Dr. Panyam Narahari Sastry, Dr. P. Chandra Sekhar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET55320
Publish Date : 2023-08-12
ISSN : 2321-9653
Publisher Name : IJRASET